Abstract. For fat tailed distributions (i.e. those that decay slower than an exponential), large deviations not only become relatively likely, but the way in which they are realized changes dramatically: a finite fraction of the whole sample deviation is concentrated on a single variable. Large deviations are not the accumulation of many small deviations; rather, they are dominated by a single large fluctuation. The regime of large deviations is separated from the regime of typical fluctuations by a phase transition where the symmetry between the points in the sample is spontaneously broken. This phenomenon has been discussed in the context of mass transport models in physics, where it takes the form of a condensation phase transition. Yet, the phenomenon is far more general. For example, in risk management of large portfolios, it suggests that one should expect losses to concentrate on a single asset: when extremely bad things happen, it is likely that there is a single factor on which bad luck concentrates. Along similar lines, one should expect that bubbles in financial markets do not gradually deflate, but rather burst abruptly, and that on the rainiest day of a year, precipitation concentrates on a given spot. Analogously, when applied to biological evolution, we are led to infer that, if fitness changes for individual mutations have a broad distribution, those large deviations that lead to better fit species are not likely to result from the accumulation of small positive mutations. Rather, they are likely to arise from large rare jumps.


Large Deviation Theory (LDT) [1] was developed to assess the probability of rare sample fluctuations, but it is gradually being recognized as a central subject in statistical physics, information theory, statistical inference and learning [2, 3]. I discuss the extension of the theory to distributions with fat tails, showing that large deviations, in this case, are a simple realization of a second order phase transition with symmetry breaking. This phenomenon has already been discussed in simple models of mass transport [4, 6], where it manifests as a condensation transition. However, although the theory has been derived in full detail [6], the more general relation between this phenomenon and LDT has not been discussed, to the best of my knowledge. Besides giving the general argument and relating it to works on particle models, I also hint at possible applications, to illustrate the generality of the idea. In all cases discussed, the point I'd like to make is that, when individual shocks are fat tail distributed, it is natural to expect that large deviations concentrate on a single variable of the sample.

We're given a sample $\underline{X} = \{X_1, \ldots, X_N\}$, where $X_i$ are i.i.d. draws from a distribution $Q(x)$ with support on $\mathcal{X}$ and $N$ is large. We're interested in estimating the probability of events

$$E = \left\{ \underline{X} : \frac{1}{N}\sum_{i=1}^{N} g(X_i) \in [\bar g, \bar g + \delta g] \right\}$$

where $g(x)$ is a function that satisfies the law of large numbers, in the sense (e.g. weak) that, for any $\epsilon > 0$,

$$\lim_{N\to\infty} P\left\{ \left| \frac{1}{N}\sum_{i=1}^{N} g(X_i) - \langle g \rangle \right| > \epsilon \right\} = 0 \tag{1}$$

where $\langle g \rangle = \int dx\, g(x) Q(x)$ is the expected value of $g(X)$ (the existence of $\langle g \rangle$ is a sufficient condition for this to hold [7]).

If $\langle g \rangle \notin [\bar g, \bar g + \delta g]$ the event $E$ is not typical, i.e. it has a vanishing probability as $N \to \infty$. In the case where $X_i$ have finite support we can invoke Sanov's theorem [2, 3], which states that the probability of $E$ is asymptotically given by

$$P\{E\} \simeq e^{-N D_{KL}(P^*\|Q)} \tag{2}$$

where $D_{KL}(P\|Q) = \int dx\, P(x) \log[P(x)/Q(x)]$ is the Kullback-Leibler divergence, and $P^*(x)$ is the distribution that minimizes $D_{KL}(P\|Q)$ over all $P$'s that satisfy $E$. LDT then boils down to a problem of constrained optimization, which is solved by introducing the constraint $\sum_i g(X_i)/N \in [\bar g, \bar g + \delta g]$ with a Lagrange multiplier $\beta$ in the optimization of $D_{KL}(P\|Q)$. For the case where $\delta g$ is infinitesimal, the result is given by

$$P^*(x) = \frac{1}{Z(\beta)}\, Q(x)\, e^{-\beta g(x)} \tag{3}$$

where $Z(\beta)$ is a normalization constant and $\beta$ is adjusted so that $\int dx\, P^*(x) g(x) = \bar g$. These considerations generalize to the case when $Q(x)$ decays at least as fast as an exponential for $|x| \to \infty$. In both cases, the probability of $E$ is expressed in terms of the pdf of $\bar g$ as $P\{E\} = p(\bar g)\, d\bar g \propto e^{-N I(\bar g)}\, d\bar g$, where the rate function (also called Cramér's function) $I(\bar g)$ is given by

$$I(\bar g) = -\lim_{N\to\infty} \frac{1}{N} \log P\{E\} = D_{KL}(P^*\|Q).$$

In practice, within these assumptions, $I(\bar g)$ for i.i.d. samples can be computed from the function $\phi(h) = \log\langle e^{h g(x)} \rangle$ through the Legendre transform [2, 3].
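As an illustration of the Legendre transform recipe, the following minimal sketch evaluates $I(\bar g) = \sup_h [h \bar g - \phi(h)]$ numerically for the exponential distribution $Q(x) = e^{-x}$ with $g(x) = x$, for which $\phi(h) = -\log(1-h)$ for $h < 1$ and the closed form is $I(\bar g) = \bar g - 1 - \log \bar g$. The grid `h_grid` and the function names are assumptions made for this example only; note that $h$ plays the role of $-\beta$ in Eq. (3).

```python
import numpy as np

def rate_function(g_bar, phi, h_grid):
    """Numerical Legendre transform: I(g_bar) = max over h of [h * g_bar - phi(h)]."""
    return float(np.max(h_grid * g_bar - phi(h_grid)))

def phi_exponential(h):
    """phi(h) = log <exp(h X)> = -log(1 - h) for Q(x) = exp(-x), valid for h < 1."""
    return -np.log1p(-h)

# Hypothetical grid of h values; it must stay below h = 1, where phi diverges.
h_grid = np.linspace(-50.0, 0.999, 200001)

for g_bar in (0.5, 2.0, 4.0):
    numeric = rate_function(g_bar, phi_exponential, h_grid)
    exact = g_bar - 1.0 - np.log(g_bar)  # closed form for the exponential case
    print(f"g = {g_bar}: numeric {numeric:.6f}, exact {exact:.6f}")
```

The maximization works here because $\phi(h)$ is finite in an open neighborhood of $h = 0$; the grid-based maximum is only an approximation whose accuracy depends on the grid spacing.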

This procedure not only determines the probability of the large deviation $E$, but it also tells us how untypical outcomes are "typically" realized: one can show [2] that the distribution of $X$, conditional on the occurrence of $E$, is given by $P\{X = x | E\} = P^*(x)$. So, for instance, a sample $\underline{X}$ exhibiting a large deviation in the mean $\bar x = \sum_i X_i/N \neq \langle X \rangle$ can be thought of as a sequence drawn independently from $P^*(x)$ in Eq. (3), with $g(x) = x$ and $\beta$ fixed so as to match the average.

Footnote: A more precise statement of Sanov's theorem would require the introduction of types. The interested reader is referred to [2] for a detailed derivation. Here I keep an informal style, in order not to bury the gist of the argument in mathematical details.

The rate function $I(\bar g)$ has the property that it is non-negative and it vanishes for $\bar g = \langle g \rangle$, which corresponds to the point $h = 0$. This description holds if $\phi(h)$ exists at least for $h$ in an open neighborhood of the origin, i.e. if the pdf of $g(X)$ decays at least as fast as an exponential for $|x| \to \infty$. What happens if this is not true?

Without loss of generality we can restrict to the case $g(x) = x$ in what follows. We shall call fat tailed any distribution $Q(x)$ such that $e^{hx} Q(x)$ diverges for all $h > 0$ when $x \to \infty$. For simplicity, we focus on the right tail of the pdf, and assume that $Q(x)$ vanishes at least exponentially fast as $x \to -\infty$. This includes stretched exponential distributions $Q(x) \sim e^{-a x^{\alpha}}$ with $\alpha < 1$ and power law distributions $Q(x) \sim A x^{-\gamma}$ for $x \gg 1$.

Let us consider the specific case $Q(x) = A/(1 + x)^{\gamma}$ for $x \in \mathbb{N}$, where $A = 1/\zeta(\gamma)$. Let $\gamma > 3$, so that we are in the regime where both the law of large numbers and the Central Limit Theorem hold. It is clear that the normalizing constant

$$Z(\beta) = \sum_{x=0}^{\infty} \frac{A\, e^{-\beta x}}{(1+x)^{\gamma}} \tag{4}$$

in Eq. (3) is finite only for $\beta \ge 0$. The expected value of $X$ under the distribution $P^*(x)$,

$$E_{\beta}[X] = -\frac{\partial}{\partial \beta} \log Z(\beta),$$

is a decreasing function of $\beta$ and $E_{\beta=0}[X] = \langle X \rangle$. Hence, the recipe for large deviations works for all deviations where $\bar x < \langle X \rangle$, as it is always possible to find a value of $\beta(\bar x)$ such that $E_{\beta}[X] = \bar x < \langle X \rangle$. In words, it is always possible to introduce an (exponential) cutoff to the distribution of $X$ in order to reduce its expected value (see also [8], sect. 3.3.5).
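The statement that an exponential cutoff can lower the mean, but never raise it, can be checked numerically. The sketch below assumes $\gamma = 4$ and truncates the support at $10^5$ (both choices are mine, for illustration); `tilted_mean` and `solve_beta` are hypothetical helper names. It evaluates $E_\beta[X]$ under $P^*(x) \propto Q(x) e^{-\beta x}$ and finds by bisection the $\beta$ that matches a target $\bar x < \langle X \rangle$.

```python
import numpy as np

GAMMA = 4.0                                # assumed tail exponent (gamma > 3)
x = np.arange(0, 10**5, dtype=float)       # truncated support, an approximation
w = (1.0 + x) ** (-GAMMA)                  # unnormalised Q(x) = A / (1 + x)^gamma

def tilted_mean(beta):
    """E_beta[X] under P*(x) proportional to Q(x) * exp(-beta * x), for beta >= 0."""
    p = w * np.exp(-beta * x)
    return float((x * p).sum() / p.sum())

def solve_beta(target, lo=0.0, hi=50.0, iters=60):
    """Bisection on the decreasing map beta -> E_beta[X]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) > target:
            lo = mid                        # mean still too large: increase beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

mean0 = tilted_mean(0.0)                    # <X> = zeta(3)/zeta(4) - 1 for gamma = 4
beta_star = solve_beta(0.5 * mean0)         # cutoff that halves the mean
print(mean0, beta_star, tilted_mean(beta_star))
```

No non-negative $\beta$ gives a mean above `mean0`, and for $\beta < 0$ the sum in Eq. (4) diverges on the untruncated support, which is why the same recipe cannot produce $\bar x > \langle X \rangle$.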

What about large deviations with $\bar x > \langle X \rangle$? It is easy to show that there are typical ways to realize large fluctuations where the excess of the average is taken up by a single variable. Indeed, consider all the samples $\underline{X}$ such that the sum over all but the largest variable $X_{i^*}$ is "typical", i.e.

$$\left| \frac{1}{N-1} \sum_{i \neq i^*} X_i - \langle X \rangle \right| < \epsilon, \tag{5}$$

and $X_{i^*} = N(\bar x - \langle X \rangle) + \langle X \rangle$. For each such sample the average takes the value $\bar x$, hence the probability to observe $\bar x$ is at least

$$e^{-N I(\bar x)} \gtrsim N Q(X_{i^*}) \simeq A N \left[ N(\bar x - \langle X \rangle) + \langle X \rangle \right]^{-\gamma}$$

because the event in Eq. (5) occurs with probability one, and there are $N$ ways in which $i^*$ can be chosen. Since this lower bound decays only as a power of $N$, its logarithm divided by $N$ vanishes as $N \to \infty$. This means that the rate function $I(\bar x)$ vanishes for all $\bar x > \langle X \rangle$. In loose words, "democratic" ways to realize large deviations, where $\bar x$ is obtained as the average of i.i.d. draws from a modified distribution, are not typical. Large deviations are typically realized, for fat tailed pdf's, by breaking the symmetry between the variables and having one of them take an extensive value (i.e. a value proportional to $N$). The fact that $I(\bar x) = 0$ for all $\bar x > \langle X \rangle$ implies that $I(\bar x)$ has a singularity at $\bar x = \langle X \rangle$ in the second derivative. This is the analogue of a second order phase transition in statistical physics, which is indeed accompanied by the spontaneous breaking of the symmetry between the data points in the sample. The same phenomenon occurs for stretched exponential distributions or for log-normal distributions, where again $I(\bar x) = 0$ for all $\bar x > \langle X \rangle$, as illustrated in Fig. 1 (see caption).

Footnote (on matching the average with $P^*$ in Eq. (3)): One could think of other distortions of the original pdf that match the average $\bar x$ in an i.i.d. sample, such as a translation of the pdf by the appropriate amount, $Q(x) \to \tilde Q(x) = Q(x + \langle X \rangle - \bar x)$. I.i.d. draws from $\tilde Q(x)$ will yield typical samples which are, in general, very unlikely, i.e. which have a probability much less than $e^{-N I(\bar x)}$.
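To make the vanishing of the rate function concrete, the following sketch evaluates $-\frac{1}{N}\log[N Q(X_{i^*})]$, the rate implied by the single-jump lower bound, for increasing $N$. It assumes $\gamma = 4$ and an excess $\bar x - \langle X \rangle = 1$ (illustrative choices), and uses $Q(x) = A/(1+x)^{\gamma}$ without dropping the subleading $+1$; `single_jump_rate` is a hypothetical helper name.

```python
import math

GAMMA = 4.0                          # assumed tail exponent
ZETA3 = 1.2020569031595942           # Apery's constant, zeta(3)
ZETA4 = math.pi ** 4 / 90.0          # zeta(4)
A = 1.0 / ZETA4                      # normalisation of Q(x) = A / (1 + x)^gamma
MEAN = ZETA3 / ZETA4 - 1.0           # <X> for gamma = 4

def single_jump_rate(N, x_bar):
    """-(1/N) * log of the bound N * Q(x_star), with x_star = N (x_bar - <X>) + <X>."""
    x_star = N * (x_bar - MEAN) + MEAN
    log_bound = math.log(A * N) - GAMMA * math.log1p(x_star)
    return -log_bound / N

rates = [single_jump_rate(10 ** k, MEAN + 1.0) for k in range(2, 7)]
for k, r in zip(range(2, 7), rates):
    print(f"N = 10^{k}: rate bound = {r:.3e}")
```

The bound behaves like $(\gamma - 1)\log N / N$, so it shrinks towards zero, in contrast with the finite rate that an exponentially decaying $Q(x)$ would give for the same excess.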

1. Relation to condensation phase transition in mass transport models

The resemblance of this phenomenon to a phase transition in statistical physics is made even more evident by the following mapping of the above problem into an interacting gas problem. The discussion below closely follows [4]. Consider $\rho N$ particles distributed in $N$ boxes (or states) and let the energy be given by

$$H\{n\} = \sum_{i=1}^{N} \log(1 + n_i)$$

where $n_i$ is the number of particles in box $i = 1, \ldots, N$. This is a gas of particles with weak attractive on-site interaction. At temperature $T$, the probability of configurations is given by the Boltzmann distribution

$$P\{n\} = \frac{1}{Z(T, \rho, N)} \prod_{i=1}^{N} (1 + n_i)^{-1/T},$$

so that $1/T$ plays the role of the exponent $\gamma$. The Canonical partition function $Z(T, \rho, N)$ is obtained by summing the Boltzmann factor $e^{-H\{n\}/T}$ over all states with $\rho N$ particles. This corresponds exactly to looking at large deviations where we constrain the average of $n_i$ to be $\bar n = \rho$. The free energy is given by $F = \langle H \rangle - TS = -T \log Z$, where $S$ is the thermodynamic entropy. Hence, up to a factor $1/T$ and an additive constant fixed by the normalization of $Q$, the rate function $I(\rho)$ equals the free energy per box $F(\rho)/N$.

A simpler way to study this system is to use the Grand Canonical ensemble instead of the Canonical one. We introduce a chemical potential $\mu$ and a statistical weight $e^{-\mu/T}$ for each particle. Then we can compute the Grand Canonical partition function

$$\mathcal{Z}(T, \mu, N) = \sum_{M=0}^{\infty} e^{-\mu M/T}\, Z(T, \rho = M/N, N)$$

removing the restriction on the density. The idea is that by adjusting the chemical potential $\mu$ it is possible to vary the density $\rho$ of the states that dominate $\mathcal{Z}$, and if one inverts this relation one can compute $Z(T, \rho(\mu), N) \sim \mathcal{Z}(T, \mu, N)$.

The emphasis is different but the machinery and the concepts are exactly the same. The Grand Canonical formalism biases the a priori probabilities (those with $\mu = 0$) on the distribution of particles in each box in such a way as to recover states with a given density as large deviations, i.e. as typical outcomes under the biased distribution. But the trick only works for $\mu \ge 0$, as $\mathcal{Z}$ is undefined for $\mu < 0$, i.e. it only works for densities less than $\rho(0)$, because $\rho(\mu)$ is a decreasing function of $\mu$.

In order to achieve states with a density $\rho > \rho(0)$ the symmetry between the different boxes has to be broken. States where all boxes but one have typical occupation $n_i \approx \rho(0)$ and the remaining one gathers all the excess $N[\rho - \rho(\mu = 0)]$ particles have the largest entropy $S$, hence these are the states that are expected to be typically observed.
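The condensed state can be seen directly in a small system by computing the canonical ensemble exactly. The sketch below is an illustration under assumed parameters ($N = 50$ boxes, $M = 200$ particles, $1/T = \gamma = 4$, all chosen by me); `site_marginal` is a hypothetical helper. It builds the canonical weights by repeated convolution and returns the exact single-box occupation law $p(n) = f(n)\, Z_{N-1}(M - n)/Z_N(M)$, with $f(n) = (1+n)^{-\gamma}$.

```python
import numpy as np

def site_marginal(N, M, gamma):
    """Exact law of one box's occupation, given N boxes holding M particles in total."""
    n = np.arange(M + 1)
    f = (1.0 + n) ** (-gamma)
    f /= f.sum()   # rescaling f only rescales Z, it does not change the conditional law
    Z = np.zeros(M + 1)
    Z[0] = 1.0     # zero boxes hold zero particles
    for _ in range(N - 1):
        Z = np.convolve(Z, f)[: M + 1]     # weight of k boxes holding m particles
    Z_rest = Z                             # N - 1 boxes
    Z_all = np.convolve(Z_rest, f)[M]      # N boxes with exactly M particles
    return f * Z_rest[::-1] / Z_all        # f(n) * Z_{N-1}(M - n) / Z_N(M)

N, M, gamma = 50, 200, 4.0                 # assumed illustrative parameters
p = site_marginal(N, M, gamma)
# At most one box can hold more than M/2 particles, so this is the probability
# that the configuration contains such a macroscopically occupied box.
condensate_weight = N * p[M // 2 + 1:].sum()
print(condensate_weight, p[0])
```

With these parameters the density $M/N = 4$ is far above the critical density $\rho(0) \approx 0.11$, and the marginal splits into a bulk part concentrated on small $n$ and a separate bump at large $n$ carrying weight of order $1/N$, which is the condensate.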

Majumdar, Evans and Zia [6] generalize this derivation to generic energies of the form $H\{n\} = \sum_{i=1}^{N} F(n_i)$ with $F(n)/n \to 0$ as $n \to \infty$, and derive exact and asymptotic results for the rate function and for the pdf of the mass $n_i$. Their results provide a detailed description of LDT for i.i.d. variables with fat tailed distribution, to which we refer the interested reader. Besides describing the case of different distributions, Ref. [6] also discusses the case where the law of large numbers does not hold, i.e. $Q(x) \sim x^{-\gamma}$ with $\gamma < 2$. In this case, the sum can be well approximated by the largest terms [9], and (pseudo) condensation occurs typically because large deviations spontaneously occur, as discussed also in [10]. The distribution of the maximum in the sample also exhibits interesting properties, for which we refer to [...]. Frisch and Sornette [12] report a similar symmetry breaking phenomenon for stretched exponential distributed variables, in the regime of extreme deviations, i.e. $\bar x \to \infty$.

2. Applications and discussion 

As a possible realization of this phenomenon consider a financial market where a particular stock is growing faster than what the fundamentals would suggest. Specifically, in the last period, the daily return $R_t$, $t = 1, \ldots, T$, has been such that the average return satisfies

$$\bar R = \frac{1}{T} \sum_{t=1}^{T} R_t > f$$

where $f$ is the daily return expected on the basis of fundamental analysis. This situation bears many similarities with the case we're discussing: in a zeroth order approximation, information efficiency (i.e. unpredictability of future returns) is captured by the assumption of independent returns. Moreover, daily returns have a power law distribution in the tails [13]. Over some time scale, longer than $T$, we expect the growth to revert to the fundamental rate $f$. But how is this transition going to be realized? Is the bubble going to deflate gradually or will it burst suddenly with a crash?

In the current state of the market, given its empirical distribution of returns, a sample with average return $f$ represents a large deviation in an i.i.d. sample with power law distribution. Therefore, such an atypical return will likely be realized with an anomalous drop in price. In brief, information efficiency (i.i.d. returns) and the empirical observation of a power law distribution of returns suggest that market crashes are the most likely way in which a financial market reverts from a bubble phase to its fundamentals.

Figure 1. I generate $N = 100$ random variables $X_i$ from different distributions: 1) exponential, $Q(x) = e^{-x}$, $x > 0$; 2) stretched exponential, $Q(x) = e^{-\sqrt{x}}/(2\sqrt{x})$, $x > 0$; and 3) power law, $Q(x) = A x^{-4}$, $x > 1$. For each I generate $M = 10^2, 10^3, 10^4$ and $10^5$ samples and pick the sample $Z$ with the largest $\sum_i X_i$. This corresponds to large deviations with probability $1/M$. Within each sample, I sort the variables in ascending order (i.e. $X_i \le X_{i+1}$) and I compute the fraction of samples where $X_i > Z_i$. This plot shows where the contribution to the large deviation concentrates: for the exponential distribution, the variables that, in the large deviation, are typically larger than in random samples are those in the middle range, whereas both for stretched exponential and for power law distributions, the variables that are significantly larger than in random samples are those in the tail.
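The numerical experiment described in the caption of Figure 1 can be sketched as follows. This is a minimal reconstruction from the caption, not the original code: the seed, the default $M = 1000$, the inverse-transform samplers and the name `rank_exceedance` are my assumptions. The stretched exponential is drawn as the square of an exponential variable, and the power law as $U^{-1/3}$ with $U$ uniform, which gives a density proportional to $x^{-4}$ for $x > 1$.

```python
import numpy as np

def rank_exceedance(draw, N=100, M=1000, seed=0):
    """For each rank i, the fraction of samples whose i-th smallest value exceeds Z_i."""
    rng = np.random.default_rng(seed)
    X = np.sort(draw(rng, (M, N)), axis=1)    # each sample sorted in ascending order
    z = X[np.argmax(X.sum(axis=1))]           # sample Z with the largest sum
    frac = (X > z).mean(axis=0)               # per rank: share of samples with X_i > Z_i
    return frac, z[-1] / z.sum()              # also the share of the sum carried by max(Z)

samplers = {
    "exponential": lambda rng, s: rng.exponential(size=s),
    "stretched exponential": lambda rng, s: rng.exponential(size=s) ** 2,
    "power law": lambda rng, s: (1.0 - rng.random(size=s)) ** (-1.0 / 3.0),
}

for name, draw in samplers.items():
    frac, top_share = rank_exceedance(draw)
    print(f"{name}: fraction at top rank = {frac[-1]:.3f}, "
          f"fraction at median rank = {frac[len(frac) // 2]:.3f}, "
          f"share of largest variable = {top_share:.3f}")
```

A small value of `frac` at a given rank means that the large-deviation sample $Z$ is atypically large at that rank; comparing the top rank with the middle ranks across the three distributions is the comparison the caption describes.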

The present discussion clearly applies to risk management of large portfolios [14]. Indeed, several risk measures are based on conditional losses in the tails. Since the loss of the portfolio is the sum of the individual losses on the assets, large losses are clear examples of large deviations. When the pdf of individual losses is fat tailed, which is definitely the case for equities [12], our discussion above suggests that typical losses will be realized in samples where one of the assets loses a much larger amount than the others. For example, in a portfolio of $N = 100$ stocks with a pdf of returns that decays as $|x|^{-4}$ [13], the VaR at 1% is dominated by the largest drop, which carries more than 56% of the total loss (a percentage that goes up to 81% for VaR at 0.1% and 94% for 0.01%). Facing an event of this type, one may be led to consider such "concentration of bad luck" on a single stock as abnormal, whereas this is precisely what we should expect, under the assumption that returns are weakly dependent.



Footnote: Indeed returns in a financial market are correlated. Applying this analysis to daily returns of N = 41 stocks in the Dow Jones index from the period 1980-2005 (see [15] for a description of the data) I found

The problem of estimating credit risk is again of the same nature, as one focuses on events where the equity of a company, which is a sum over the different lines of business, becomes negative. Again, if returns from the different investments are broadly distributed, and within the simplest approximation where they are considered independent, we expect default events to be characterized by a similar "pernicious concentration of bad luck".

Similarly, consider the rainiest day of the year in a particular region. Since the distribution of rainfall is fat tailed, it is likely that on the rainiest day precipitation concentrates on a particular spot. Indeed, using data from the South Pacific Rainfall database (PACRAIN) and selecting stations for which data from 1971 to 1992 are available (N = 58), I found that more than 39% of the total rainfall on the worst day was detected in a single station.

It is tempting to speculate on the possible application of these results to biological evolution. We think of evolutionary processes as occurring by the accumulation of mutations on the genome. Surviving individuals are those that achieve fitness changes that are large enough. The effect of a mutation on fitness is very complex and in general non-linear but, neglecting epistatic effects, one can consider the fitness change as the sum of the effects of individual mutations, in a zeroth order approximation. Then, if fitness changes of individual mutations have a broad distribution, one is led to the conclusion that fit species are not likely to result from the accumulation of small positive mutations. Rather, they are likely to arise from large fitness jumps, which is somewhat reminiscent of the notion of punctuated equilibria [16] as contrasted to phyletic gradualism.

This discussion has no further pretense than to illustrate how the concentration of large deviations for fat tailed distributions may lead to counterintuitive results, showing that phenomena such as sharp changes or strongly uneven fluctuations can arise as a result of pure randomness, without having to invoke any specific mechanism. In the terminology of Ref. [17], Dragon Kings typically occur in large deviations with fat tailed distributions. Indeed, as in other more complex phenomena (e.g. phase transitions), pure randomness here manifests through a symmetry breaking phenomenon, whereby the a priori equivalence of the data points in the sample is spontaneously broken. Given the widespread occurrence of fat tailed distributions, this is likely to be an important fact of chance to take into account.

Acknowledgments. I acknowledge useful and interesting discussions with Satya Majumdar and Didier Sornette.